Abstract: From the standpoint of neuroethology and cognitive psychology, this paper presents a general theoretical framework for intelligent systems based on the principle of relative entropy minimization. The core of the framework is to state and prove the basic principle of intelligent systems: in an intelligent system, entropy increases or decreases together with intelligence. The basic principle is of great theoretical and practical significance. From it one can not only derive two kinds of learning algorithms (the statistical simulated annealing algorithm and the annealing algorithm of mean-field theory approximation) for training a large class of stochastic neural networks, but also thoroughly dispel the misgivings that the second law of thermodynamics has created in people's minds, and hence make one fully confident in facing life, because human society, the natural world, and even the universe are all intelligent systems. In particular, population systems are intelligent systems. According to the basic principle of intelligent systems, the intelligence of a population system is proportional to the volume of the population space and decreases with the logarithm of the population density.
1 Introduction
To date, the human brain is the most sophisticated creation of the natural world and the only place where cognition and intelligence, and hence spirit, are produced. The human being, as a body carrying life, is the most advanced and perfect intelligent system on the planet. It has the features common to general complex systems. As we see it [1], a complex system is a functional system that has any structure (including hierarchical and variably hierarchical structure) and consists of any number (≥ 1) of subsystems capable of particular functions. These subsystems form closed loops of their own based on feedback mechanisms, are capable of self-adjustment according to different optimality criteria, interact with each other in various ways, and are composed of various dynamical, logical, nonlogical, and heuristic links. A complex system integrates at least the following six aspects:
(1) Multi-dimensionality: the magnitude of dimension, the types of dimension, and transitions between the types of dimension;
(2) Multi-parameter;
(3) Multi-relationship: relationships between (among) variables at the same level (in particular, nonlinear relationships), relationships between (among) variables at different levels (in particular, nonlinear relationships), or the crossover of the two kinds of relationship;
(4) Multi-criterion: multi-component, multi-scale (multi-level);
(5) Multi-functionality: emergence, co-operation, competition, adaptation, closed-openness, and so on;
(6) Multi-discipline of knowledge used.
Obviously, an intelligent system is a complex system.
Intelligent control is the process by which an intelligent machine performs a task [2]. The theory of intelligent control is a combination of the mathematical and linguistic methods and algorithms used in the system and the process, and it is based on the cross-disciplinary fields of neuroethology, cognitive psychology, computer science, systems science, artificial intelligence and information science. Only when we are accustomed both to introducing and effectively using all knowledge of modern science, in particular of systems science, in the scientific fields related to the "human being", and to introducing and effectively using all scientific knowledge related to the "human being" in the other modern sciences, in particular in the fields of systems science, in other words, only when the human being has full knowledge of itself, can an intelligent control system in the true sense be obtained and a theory of intelligent systems in the true sense be established.

* Having revisited the content of the paper "Intelligent control with relative entropy minimizing" (Control Theory and Applications, Vol. 16, No. 1, pp. 27-31, in Chinese), I have proposed and proved the basic principle of intelligent systems. After further supplementation and refinement, this article was formed. It is submitted for the first time in English.

From the standpoint of neuroethology and cognitive psychology, this paper presents a general theoretical framework for intelligent systems based on the principle of relative entropy minimization. The core of the framework is to state and prove the basic principle of intelligent systems: in an intelligent system, entropy increases or decreases together with intelligence. The basic principle is of great theoretical and practical significance. From it one can not only derive two kinds of learning algorithms (the statistical simulated annealing algorithm and the annealing algorithm of mean-field theory approximation) for training a large class of stochastic neural networks, but also thoroughly dispel the misgivings that the second law of thermodynamics has created in people's minds, and hence make one fully confident in facing life, because human society, the natural world, and even the universe are all intelligent systems. Human intelligence resides in the brain; the intelligence of the universe resides in black holes. In particular, the highest intelligence of the universe resides in the black hole that has the largest volume of all black holes.
This paper is organized as follows. Section 2 presents the basic viewpoint and basic construction of intelligent systems and gives the block diagram of an intelligent control system. Section 3 states and proves the basic principle of intelligent systems: entropy increases or decreases together with intelligence in the intelligent system. Section 4 demonstrates how to derive two kinds of learning algorithms (the statistical simulated annealing algorithm and the annealing algorithm of mean-field theory approximation) for training a large class of stochastic neural networks by means of the basic principle of intelligent systems.
2 Basic viewpoint and basic construction of intelligent systems

Figure 1 shows a block diagram of an intelligent system from the perspective of neuroethology and cognitive psychology.

It is an abstract framework describing a common intelligent control system, including hierarchical intelligent control systems. We give a brief description of the block diagram of the intelligent control system shown in Figure 1.


The perception system performs feature extraction, transformation, integration and coding of environmental stimuli. The coded environmental stimuli are input to the memory system. Memory is divided into short-term memory and long-term memory. The information processed by short-term memory comes from two sources. On the one hand, it processes the coded information from the environment. Here the short-term memory is highly selective: it eliminates unwanted or minor information, and retains and further recognizes the needed information. On the other hand, the short-term memory refines some portion of the information stored in the long-term memory. The long-term memory is a huge information store holding various kinds of information, such as motor skills, grammatical information, semantic information, pragmatic information, values, processing procedures, etc. Driven by current and past input, some portion of the long-term memory is activated. The activated information is called active memory [3]. Some portion of the active memory undergoes a refining process in the short-term memory, which is where current cognitive activities take place. Obviously, there must be a channel for information exchange between the short-term memory and the long-term memory. This information exchange at the same time serves as a coordinator between the control system and the reaction system. The control system itself has a complete, multi-functional, high-level learning system. The control system determines how the entire intelligent control system works. By learning directly from the environment and making use of the information stored in the memory system, the control system implements its organization and management function, processes the objectives and provides a policy to achieve them, that is, it determines a plan and makes a decision. The coordination system is an intermediate structure between the control system and the other subsystems; it makes decisions and coordinates based on short-term memory, in order "to pass information from the higher level down to the lower level" and "to report information from the lower level to the higher level". The reaction system controls the output of the entire intelligent system, ranging from various actions to language and expression.
Fig. 1 Block diagram of intelligent control system


3 Basic principle of intelligent systems

Assume the intelligent system $S$ shown in Figure 1 can be expressed as the following triplet with a function:

$$S = (X, F, P, H(p_0, p)) \tag{1}$$

where $X$ is the state space; $F$ is a $\sigma$-algebra; $P$ is a probability measure on $(X, F)$; and $H(p_0, p)$ is the relative entropy defined as

$$H(p_0, p) = \int p_0(x, \theta) \ln \frac{p_0(x, \theta)}{p(x)} \, dx \tag{2}$$

where $p_0(x, \theta)$ is determined by observation and $\theta = (\xi, \eta, t)$ is a parameter, in which $\xi$ is the action rule of the intelligent system $S$, $\eta$ is the event database, i.e. the event set stored in the memory system, and $t$ is time; $p(x)$ is the maximum entropy probability density function representing a priori knowledge and is given by

$$p(x) = \frac{1}{Z} \exp(m \cdot U(x)) \tag{3}$$

where $U(x)$ is a vector potential, $m$ is a vector factor and $Z$ is the partition function. Whenever there is a new, sufficiently large input to the system, the system executes a minimization of formula (2), and when the process ends, the 0-1 symbol string supplied by the reaction system is the optimal task of the system.
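As a concrete illustration of equations (2) and (3), the following minimal sketch evaluates both on a small finite state space, where the integral becomes a sum. The state space, the potentials, the factor `m`, the observed density `p0` and the function names are all hypothetical, and the positive sign of the exponent in `max_entropy_density` is an assumed convention.

```python
import math

def max_entropy_density(potentials, m):
    """Equation (3) on a finite state space: p(x) = exp(sum_j m_j U_j(x)) / Z.

    potentials[x] is the vector (U_1(x), ..., U_k(x)) for state x.
    The positive sign of the exponent is an assumed convention.
    """
    weights = [math.exp(sum(mj * uj for mj, uj in zip(m, u))) for u in potentials]
    z = sum(weights)  # partition function Z
    return [w / z for w in weights]

def relative_entropy(p0, p):
    """Equation (2) as a sum: H(p0, p) = sum_x p0(x) ln(p0(x) / p(x)).

    Terms with p0(x) = 0 contribute nothing (0 ln 0 is taken as 0).
    """
    return sum(a * math.log(a / b) for a, b in zip(p0, p) if a > 0.0)

# Hypothetical three-state example with a two-component potential.
potentials = [(0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
p = max_entropy_density(potentials, m=(0.5, -0.2))
p0 = [0.2, 0.5, 0.3]  # an "observed" density, chosen arbitrarily

print(relative_entropy(p0, p))  # positive, since p0 differs from p
print(relative_entropy(p, p))   # zero when the two densities coincide
```

The relative entropy is zero exactly when the observed density matches the prior density, which is the situation the minimization of formula (2) drives toward.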
The principle of minimum relative entropy is well known. Here we only give a sense of what it implies for the system $S$.

Theorem 1. Write the relative entropy in equation (2) as $H(p_0, p)$. Then

$$\lim_{i \to \infty} H(p_0(x_i), p(x_i)) = 0. \tag{4}$$


Proof. $p_0(x_i)$ is an estimate of $p(x_i)$ obtained by any practicable method of random experiment (learning) under certain constraints on $p(x)$. For example, because of formula (3), there is a Markov chain of observable samples $x_1, \ldots, x_n$ on the state space $X$ whose limit distribution is $p(x)$. By the ergodic theorem, the estimate $p_0(x_i)$ of the density function $p(x_i)$ can be obtained from the orbits $x_i(t)$, $i = 1, 2, \ldots, n$, $t > 0$, of the Markov chain. This process produces a directed sequence $p_0(x_i)$ $(i = 1, 2, \ldots)$ and a directed subsequence $p_0(x_{L(i)})$, $L(i) \ge i$, $i = 1, 2, \ldots$, on the complete metric space $(D, d)$, where $D$ is the set of probability density functions on the measurable space $(X, F)$. At this time,

$$\lim_{i \to \infty} p_0(x_i) = \lim_{i \to \infty} p_0(x_{L(i)}) = \lim_{i \to \infty} p(x_i).$$

In fact, suppose $\lim_{i \to \infty} p_0(x_i) = \eta$. Then for any small positive number $\varepsilon$ there is a positive integer $I$ such that $\| p_0(x_i) - \eta \| < \varepsilon$ when $i > I$. Because $\lim_{i \to \infty} L(i) = \infty$, there is a positive integer $i_0$ such that $L(i) > I$ when $i > i_0$; at this time $\| p_0(x_{L(i)}) - \eta \| < \varepsilon$. Hence the subsequence $\{ p_0(x_{L(i)}) \}$ converges, and $\lim_{i \to \infty} p_0(x_i) = \lim_{i \to \infty} p_0(x_{L(i)}) = \lim_{i \to \infty} p(x_i) = \eta$. The proof is completed.


The process of proving the above theorem is the process in which $H(p_0, p)$ is minimized. Limit theory dictates that there must be a minimum value approaching 0, and that this minimum value is unique. The minimum value corresponds to the uniquely correct 0-1 character string for implementing a given objective (environmental input). At the control level, a 0-1 character string corresponding to the correct decision is given, and at the reaction system (level), a 0-1 character string corresponding to the correct task is given.
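The convergence described in Theorem 1 can be illustrated numerically. The sketch below is only an illustration under stated assumptions, not the construction used in the proof: it takes a hypothetical three-state density `p`, runs a Metropolis chain whose limit distribution is `p`, forms the estimate `p0` from the visit frequencies of one orbit, and prints the relative entropy for increasing orbit lengths. The choice of chain and all names are assumptions.

```python
import math
import random

def relative_entropy(p0, p):
    """H(p0, p) = sum_x p0(x) ln(p0(x) / p(x)), with 0 ln 0 taken as 0."""
    return sum(a * math.log(a / b) for a, b in zip(p0, p) if a > 0.0)

def empirical_density(p, n_steps, rng):
    """Estimate p from one orbit of a Metropolis chain with limit distribution p."""
    counts = [0] * len(p)
    x = 0
    for _ in range(n_steps):
        y = rng.randrange(len(p))                 # symmetric uniform proposal
        if rng.random() < min(1.0, p[y] / p[x]):  # Metropolis acceptance
            x = y
        counts[x] += 1
    return [c / n_steps for c in counts]

rng = random.Random(0)
p = [0.5, 0.3, 0.2]  # hypothetical limit distribution
for n in (100, 10_000, 200_000):
    p0 = empirical_density(p, n, rng)
    print(n, relative_entropy(p0, p))
```

For long orbits the printed relative entropy becomes very small, in line with equation (4).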


From the above discussion, we obtain the following proposition.

Proposition. The necessary and sufficient condition for the intelligent system $S = (X, F, P, H(p_0, p))$ to exist is that it can be given by

$$S = (X, F, P, H(p_0, p)) = \lambda \min_{p_0} \left\{ \int p_0(x, \theta) \ln \frac{p_0(x, \theta)}{p(x)} \, dx \right\}, \tag{5}$$

where $\lambda > 0$.


Definition 1. Knowledge of the intelligent system $S$ is defined as the structural information of the intelligent system $S$, measured by equation (2).

Definition 2. The intelligence of an intelligent system is the rate of change of the knowledge acquired by the intelligent system with respect to the linkage coefficients between the component units of the intelligent system.

Theorem 2 (Basic principle of intelligent systems). Entropy increases or decreases together with intelligence in the intelligent system $S$.

To keep the discussion simple and sufficient, the concept of sufficiency of relative entropy is introduced [4].


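Definition 3. For the intelligent system $S$, assume that $\mathfrak{A}$ is a $\sigma$-algebra on the set $\Gamma$ of sub-vectors of $x \in X$ produced by its subsystems, the memory system and the control system, and that it is a $\sigma$-subalgebra of $F$, that is, $\mathfrak{A} \subset F$. Let $P(X)$ be the set of all probability measures on $(X, F)$ and let $\mathcal{P} \subset P(X)$. The $\sigma$-subalgebra $\mathfrak{A}$ is said to be sufficient with respect to $\mathcal{P}$ if, for any $A \in F$, there exists an $\mathfrak{A}$-measurable function

$$h = E_\mu(1_A \mid \mathfrak{A}), \quad \mu\text{-a.e.}, \ \forall \mu \in \mathcal{P}. \tag{6}$$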


Assume $P_0$ and $P$ are the probability measures corresponding to the probability density functions $p_0(x)$ and $p(x)$, respectively, with $\{P_0, P\} \subset P(X)$. From the definition of sufficiency we easily find that $\mathfrak{A}$ is sufficient with respect to $\{P_0, P\}$. Therefore we have:

$$H_{\mathfrak{A}}(P_0, P) = H_F(P_0, P) = H(p_0, p).$$

Since $\mathfrak{A}$ is sufficient with respect to $\{P_0, P\}$, the difference between $P_0$ and $P$ is determined by $\mathfrak{A}$ alone. Therefore conclusions concerning the system

$$S = (X, F, P, H(p_0, p)) = \lambda \min_{p_0} \left\{ \int p_0(x, \theta) \ln \frac{p_0(x, \theta)}{p(x)} \, dx \right\}$$

can be discussed on its subsystems, the memory system and the control system.

Following this line of thinking, we can not only prove the basic principle of the intelligent system $S$, but also produce a special algorithm, the algorithm of mean-field theory approximation [5]. The proof of Theorem 2 is given in the following.


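Note that the vector $m$ in equation (3) denotes the linkage coefficients between component units (nerve cells). For simplicity, we write $p_0(x, \theta)$ as $p_0(x)$. We now find $\partial H_{\mathfrak{A}}(p_0, p) / \partial m_i$:

$$\frac{\partial H_{\mathfrak{A}}(p_0, p)}{\partial m_i} = -\int \frac{p_0(x_\Gamma)}{p(x_\Gamma)} \frac{\partial p(x_\Gamma)}{\partial m_i} \, dx_\Gamma, \tag{7}$$

where $x_\Gamma \in \Gamma$. We have:

$$\frac{\partial p(x_\Gamma)}{\partial m_i} = \frac{\exp\left(\sum_j m_j U_j(x_\Gamma)\right) U_i(x_\Gamma)}{Z_\Gamma} - p(x_\Gamma) \sum_{x'_\Gamma} \frac{\exp\left(\sum_j m_j U_j(x'_\Gamma)\right) U_i(x'_\Gamma)}{Z_\Gamma} = p(x_\Gamma) U_i(x_\Gamma) - p(x_\Gamma) \sum_{x'_\Gamma} p(x'_\Gamma) U_i(x'_\Gamma),$$

where $Z_\Gamma$ is the restriction of $Z$ to $\Gamma$.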

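Substituting the above equation into (7), we have

$$\frac{\partial H_{\mathfrak{A}}(p_0, p)}{\partial m_i} = -\int \frac{p_0(x_\Gamma)}{p(x_\Gamma)} \frac{\partial p(x_\Gamma)}{\partial m_i} \, dx_\Gamma = -\int p_0(x_\Gamma) U_i(x_\Gamma) \, dx_\Gamma + \int p_0(x_\Gamma) \left( \sum_{x'_\Gamma} p(x'_\Gamma) U_i(x'_\Gamma) \right) dx_\Gamma = E_p(U_i(x_\Gamma)) - E_{p_0}(U_i(x_\Gamma)). \tag{8}$$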
Because $E_p(U_i(x_\Gamma)) = H_p(x_\Gamma) - \ln Z_\Gamma$, equation (8) reads

$$\frac{\partial H_{\mathfrak{A}}(p_0, p)}{\partial m_i} = H_p(x_\Gamma) - \ln Z_\Gamma - E_{p_0} U_i(x_\Gamma).$$

Consequently we have:

$$\ln Z_\Gamma + E_{p_0}(U_i(x_\Gamma)) = H_p(x_\Gamma) - \frac{\partial H_{\mathfrak{A}}(p_0, p)}{\partial m_i}, \tag{9}$$

where $H_p(x_\Gamma)$ is the maximum entropy determined by the observation sampling set or training sampling set; it measures the uncertainty of the task executed by the memory system and the control system. Notice that $E_{p_0}(U_i(x_\Gamma))$ is a determinate real number. Once the number of component units (nerve cells) of the memory system and the control system is determined, $Z_\Gamma$ (the partition function) is determined as well. Therefore the left side of equation (9) can be regarded as a constant. If the first term on the right side (the entropy) increases (or decreases), the second term (by Definitions 1 and 2, the intelligence) must increase (or decrease) as well. Since $\mathfrak{A}$ is sufficient for $\{P_0, P\}$, the above conclusion is completely applicable to the entire system $S$. The proof of the basic principle is completed.
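The derivative in equation (8) can be checked numerically on a small example. The sketch below assumes a finite state space (so integrals become sums) and the density form $p(x) = \exp(\sum_j m_j U_j(x)) / Z$; the potentials, the factor `m`, the observed density `p0` and the function names are hypothetical. It compares the closed form $E_p(U_i) - E_{p_0}(U_i)$ with a central finite difference of the relative entropy.

```python
import math

def density(potentials, m):
    """p(x) = exp(sum_j m_j U_j(x)) / Z on a finite state space (assumed form)."""
    w = [math.exp(sum(mj * uj for mj, uj in zip(m, u))) for u in potentials]
    z = sum(w)
    return [wi / z for wi in w]

def relative_entropy(p0, p):
    """H(p0, p) = sum_x p0(x) ln(p0(x) / p(x)), with 0 ln 0 taken as 0."""
    return sum(a * math.log(a / b) for a, b in zip(p0, p) if a > 0.0)

def gradient(p0, potentials, m):
    """Closed form of equation (8): dH/dm_i = E_p[U_i] - E_p0[U_i]."""
    p = density(potentials, m)
    return [sum((pm - po) * u[i] for pm, po, u in zip(p, p0, potentials))
            for i in range(len(m))]

def numeric_gradient(p0, potentials, m, eps=1e-6):
    """Central finite difference of H(p0, p) with respect to each m_i."""
    grads = []
    for i in range(len(m)):
        hi = list(m)
        lo = list(m)
        hi[i] += eps
        lo[i] -= eps
        diff = (relative_entropy(p0, density(potentials, hi))
                - relative_entropy(p0, density(potentials, lo)))
        grads.append(diff / (2.0 * eps))
    return grads

potentials = [(0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.0, 0.0)]
m = [0.3, -0.7]
p0 = [0.1, 0.4, 0.3, 0.2]

print(gradient(p0, potentials, m))
print(numeric_gradient(p0, potentials, m))
```

The two printed vectors agree to within the finite-difference error, which supports the sign and form of equation (8) under the assumed density.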

Let us now look at the population system. Population systems are intelligent systems. According to the basic principle of intelligent systems, the intelligence of a population system is proportional to the volume of the population space and decreases with the logarithm of the population density.


4 Intelligent behavior of intelligent system 


The intelligent behavior of the intelligent system $S$ is accomplished by its subsystems, the control system and the memory system. The control system and the memory system are neural network systems. Various intelligent behaviors are shown biologically as changes in the linkage weights between nerve cells.

The learning process is the process of acquiring knowledge. The learning process of a neural network system is divided into two phases. The first phase is a process in which the neural network system abstracts the environment and establishes a model of the problem to be solved: the model establishment process. The second phase is a process in which the problem is solved, namely, the uncertainty caused in the system by environmental regularity is minimized. Corresponding to the two phases, there are the following results.

Theorem 3 [6]. Assume that the compound system formed by the neural network system $N$ and the environment $E$ that it is in is isolated from the outside world, and that the learning process of the neural network system is a Markov process with a given transition probability function. When input information of sufficient magnitude is obtained, the neural network system starts its learning process. The process finally arrives at equilibrium from a state far from equilibrium. The equilibrium is the state in which the system has maximum entropy. The necessary and sufficient condition for maximum entropy is that the distribution density function of the system is given by

$$p(x_N) = \frac{1}{Z_N} \exp(U(x_N)) \tag{10}$$

where $x_N$ is the state variable of the neural network system, $U(x_N)$ is the vector potential function and $Z_N$ is the partition function.


Theorem 4 [6]. Under the conditions of Theorem 3, assume the linkage weight between the $i$-th and $j$-th nerve cells of the neural network system is $w_{ij}$ and $w_{ij} = w_{ji}$. Then, based on the basic principle of the intelligent system, we have

$$\frac{\partial H_N(p_0, p)}{\partial w_{ij}} = \gamma (P_{ij} - P'_{ij}),$$

where $\gamma = -\frac{1}{kT}$, $k$ is the Boltzmann constant, and $T$ corresponds to the temperature in a physical system but is a control parameter here; $P_{ij}$ is the average probability that unit $i$ and unit $j$ are both conducting when there is environmental input and the network has arrived at equilibrium; $P'_{ij}$ is the corresponding probability when there is no environmental input.
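Theorem 4 suggests adjusting each weight along $P_{ij} - P'_{ij}$. The following minimal sketch illustrates this on a tiny network under several simplifying assumptions that are not in the text: all units are visible (no hidden units), states are in {0, 1}, $kT = 1$, the equilibrium probabilities are computed exactly by enumerating all states rather than by sampling, and the environment is a hypothetical distribution `data` over the eight states of three units. The step constant `beta` and all function names are likewise assumptions.

```python
import itertools
import math

def model_distribution(w, n, temperature=1.0):
    """Equilibrium distribution p(x) ~ exp(sum_{i<j} w_ij x_i x_j / T), by enumeration."""
    states = list(itertools.product((0, 1), repeat=n))
    weights = [
        math.exp(sum(w[i][j] * s[i] * s[j]
                     for i in range(n) for j in range(i + 1, n)) / temperature)
        for s in states
    ]
    z = sum(weights)
    return states, [x / z for x in weights]

def co_activation(states, probs, n):
    """Matrix of probabilities that units i and j are both on."""
    return [[sum(p for s, p in zip(states, probs) if s[i] and s[j])
             for j in range(n)] for i in range(n)]

def relative_entropy(p0, p):
    """H(p0, p) = sum_x p0(x) ln(p0(x) / p(x)), with 0 ln 0 taken as 0."""
    return sum(a * math.log(a / b) for a, b in zip(p0, p) if a > 0.0)

def train(data_probs, n, beta=0.9, epochs=200):
    """Repeat w_ij += beta * (P_ij - P'_ij), keeping w symmetric."""
    w = [[0.0] * n for _ in range(n)]
    states, _ = model_distribution(w, n)
    clamped = co_activation(states, data_probs, n)   # P_ij: environment present
    for _ in range(epochs):
        _, probs = model_distribution(w, n)
        free = co_activation(states, probs, n)       # P'_ij: no environmental input
        for i in range(n):
            for j in range(i + 1, n):
                w[i][j] += beta * (clamped[i][j] - free[i][j])
                w[j][i] = w[i][j]
    return w

# Hypothetical environment over states (0,0,0), (0,0,1), ..., (1,1,1).
data = [0.4, 0.0, 0.0, 0.1, 0.1, 0.0, 0.0, 0.4]
w = train(data, 3)
_, before = model_distribution([[0.0] * 3 for _ in range(3)], 3)
_, after = model_distribution(w, 3)
print(relative_entropy(data, before), relative_entropy(data, after))
```

In this setting the relative entropy between the environment and the network's equilibrium distribution is lower after training than before, which is the behavior the weight adjustment is meant to produce.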


Theorem 5 [5]. Under the conditions of Theorem 3, assume that the linkage weight between the $i$-th and $j$-th nerve cells of the neural network system is $w_{ij}$ with $w_{ij} = w_{ji}$, and that the number of nerve cells of the neural network system is sufficiently large. Then, based on the basic principle of the intelligent system and in the sense of the mean-square limit, we find

$$\frac{\partial H_N(p_0, p)}{\partial w_{ij}} = U_{ij}(E_p(x_N)) - U_{ij}(E_{p_0}(x_N)),$$

where $E_{p_0}$ and $E_p$ denote the expectation operators of $p_0(x)$ and $p(x)$, respectively.


From Theorems 4 and 5 we find two kinds of learning algorithms for training stochastic neural networks:

$$\Delta w_{ij} = \beta (P_{ij} - P'_{ij}) \tag{11}$$

where $\beta$ is a constant less than but approaching 1, and

$$\Delta w_{ij} = \alpha \left( E[x_+(i)] E[x_+(j)] - E[x_-(i)] E[x_-(j)] \right) \tag{12}$$

where $\alpha$ is a constant less than but approaching 1, and $E[x(k)]$ is given by

$$E[x(k)] = \frac{1}{1 + \exp(-U_k / T)}, \quad x(k) \in \{0, 1\} \tag{13}$$

or

$$E[x(k)] = \tanh\left(\frac{U_k}{T}\right), \quad x(k) \in \{1, -1\} \tag{14}$$

where $U_k = \sum_j w_{kj} x(j)$; $E[x(k)]$ is the mean field when unit $k$ is excited, and $E[x_-(k)]$ and $E[x_+(k)]$ represent the excitement value of unit $k$ when the system arrives at a stable state in the "−" phase ("minus" phase) and the "+" phase ("plus" phase), respectively. In the "−" phase, the input units are clamped by the environment (input mode), and the excitement values of both the hidden units and the output units evolve to their stable values according to the rule expressed in equation (13) or (14). In the "+" phase, the input units are still clamped by the environment (input mode) and the output units are clamped by the required output mode, so that only the hidden units change their excitement values until they reach their stable values, based on the same rule.


Directly applying learning algorithm (11) or (12) to the training of the neural network system amounts to using the steepest descent method to find the $w_{ij}$ at which $H_N(p_0, p)$ approaches its minimum. This can cause the neural network system to be trapped in a local minimum, which in some cases is undesirable and sometimes is prohibited. An annealing algorithm is a method for avoiding local minima and obtaining the overall minimum.


1) Statistical simulated annealing algorithm

The statistical simulated annealing algorithm [7], as a common method for approximately solving large-scale combinatorial optimization problems, has seen great development in theory and practice. This algorithm can be widely used in the optimization field to provide solutions to various optimization problems, and the solution can approach the overall optimum arbitrarily closely. The algorithm allows the cost function associated with the state to climb the slope randomly, simulating the annealing process of metal. That is, under the action of the control parameter $T$, random noise is added to the conventional steepest descent process, so that if the neural network system "unfortunately" runs into a local minimum trap, it is able to get out of it until it maximally approaches the overall optimum, in other words, obtains the overall minimum of the cost function. In this article, the cost function is the relative entropy, and the minimization of the relative entropy is synchronized with that of the energy function, the cost function defined in other articles. The procedure for computer implementation of the statistical simulated annealing algorithm is:


(1) Choose a sufficiently high initial temperature $T$ and randomly determine the linkage weights between the nerve cells of the system.
(2) Give a small perturbation to the current state $x_\Gamma$ of the system to obtain $x'_\Gamma$ and the relative entropy increment $\Delta H_N$.
(3) If $\Delta H_N < 0$, accept this change; otherwise, if $\exp(-\Delta H_N / T)$ is greater than a random number in $[0, 1)$, accept this change as well. When the change is accepted, set $x_\Gamma = x'_\Gamma$.
(4) Compute the new temperature: $T(i+1) = T(0) / \ln(i+1)$, $i \ge 1$, where $T(0)$ is the initial temperature.
(5) Repeat steps (2) through (4) until $T$ approaches zero and the system no longer makes state transitions.
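Steps (1) to (5) can be sketched as follows. This is a minimal illustration under stated assumptions rather than the implementation used for the results cited here: the cost is a generic function `cost` standing in for the relative entropy, the state is a single integer, the loop stops after a fixed number of iterations instead of waiting for $T$ to approach zero, and the best state seen is tracked as a convenience. All names and the toy cost are hypothetical.

```python
import math
import random

def simulated_annealing(cost, perturb, x0, t0, n_iters, rng):
    """Anneal a state with the schedule T(i+1) = T(0) / ln(i+1)."""
    x, best, t = x0, x0, t0                      # step (1): initial temperature
    for i in range(1, n_iters + 1):
        y = perturb(x, rng)                      # step (2): small perturbation
        delta = cost(y) - cost(x)                # increment of the cost
        # step (3): always accept a decrease, sometimes accept an increase
        if delta < 0 or math.exp(-delta / t) > rng.random():
            x = y
        if cost(x) < cost(best):                 # convenience, not in the steps
            best = x
        t = t0 / math.log(i + 1)                 # step (4): new temperature
    return best                                  # step (5) replaced by n_iters

def cost(x):
    """Toy cost with two local minima on the integers -6..6 (hypothetical)."""
    return 0.05 * x ** 4 - x ** 2 + 0.5 * x

def perturb(x, rng):
    """Move one step left or right, staying inside [-6, 6]."""
    return max(-6, min(6, x + rng.choice((-1, 1))))

start = 5
result = simulated_annealing(cost, perturb, start, t0=2.0, n_iters=5000,
                             rng=random.Random(0))
print(result, cost(result))
```

Because the logarithmic schedule lowers the temperature very slowly, a practical run usually fixes an iteration budget, as the sketch does.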
2) Annealing algorithm of mean-field theory approximation
The above statistical simulated annealing algorithm "inherits" the inherent characteristics of the Monte Carlo method, namely a slow convergence rate and high computational complexity, which greatly restricts its application. For this reason, in recent years specialists and scholars have put forward the annealing algorithm of mean-field theory approximation, which has yielded good results. All previous methods for deducing the annealing algorithm of mean-field theory approximation are based on the reduction of the free energy of the system in statistical physics. Our deduction [5] of this algorithm from the perspective of relative entropy minimization is determined by the basic principle of the intelligent system, and is a natural extension and result of the idea of using relative entropy minimization to implement intelligent control.
The annealing algorithm of mean-field theory approximation is as follows:

(1) Randomly choose a high value of the parameter $T$, and randomly initialize the mean field $E[x(k)]$ of all free units and the linkage weights $w_{ij}$.
(2) Do the following cycle:
a) Randomly choose a mean-field variable $E[x(l)]$, $l = 1, \ldots, r$, and compute it according to equation (13) or (14) until a stable state is reached, to find $E[x(l)]$.
b) Decrease $T$ and repeat step a) until a stable solution of equation (13) (or (14)) appears.
(3) Carry out the above procedure for the "−" phase and the "+" phase, respectively.
(4) Modify the weights according to equation (12).
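The four steps above can be sketched for units with values in {1, −1}. The sketch assumes the form $E[x(k)] = \tanh(U_k / T)$ for equation (14) with $U_k$ computed from the current mean fields, approximates "until a stable state is reached" by a fixed number of updates per temperature, and uses a hypothetical four-unit network (two input units, one hidden unit, one output unit) with a hypothetical temperature schedule; all names are illustrative.

```python
import math
import random

def relax(w, mean, free, temperatures, sweeps, rng):
    """Steps (2a)-(2b): update randomly chosen free units while T is lowered."""
    mean = list(mean)
    for t in temperatures:                       # step (2b): decrease T
        for _ in range(sweeps):
            k = rng.choice(free)                 # step (2a): pick a free unit
            u = sum(w[k][j] * mean[j] for j in range(len(mean)) if j != k)
            mean[k] = math.tanh(u / t)           # assumed form of equation (14)
    return mean

def train_step(w, pattern, inputs, hidden, outputs, alpha, rng,
               temperatures=(4.0, 2.0, 1.0), sweeps=50):
    """One weight update by equation (12) from the '-' and '+' phases."""
    n = len(w)
    start = [rng.uniform(-0.1, 0.1) for _ in range(n)]   # step (1)
    for i in inputs:
        start[i] = pattern[i]                    # inputs clamped in both phases
    minus = relax(w, start, hidden + outputs, temperatures, sweeps, rng)
    for o in outputs:
        start[o] = pattern[o]                    # outputs clamped in '+' phase
    plus = relax(w, start, hidden, temperatures, sweeps, rng)
    for i in range(n):                           # step (4): equation (12)
        for j in range(i + 1, n):
            dw = alpha * (plus[i] * plus[j] - minus[i] * minus[j])
            w[i][j] += dw
            w[j][i] += dw
    return minus, plus

rng = random.Random(0)
n = 4
w = [[0.0] * n for _ in range(n)]
for i in range(n):
    for j in range(i + 1, n):
        w[i][j] = w[j][i] = rng.uniform(-0.5, 0.5)

pattern = [1, -1, 0, 1]   # inputs 0 and 1, target for output 3; entry 2 is unused
minus, plus = train_step(w, pattern, inputs=[0, 1], hidden=[2], outputs=[3],
                         alpha=0.9, rng=rng)
print(minus, plus)
```

Each update is deterministic given the other mean fields, so no sampling of unit states is needed.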

The annealing algorithm of mean-field theory approximation has a convergence rate about 50 to 100 times higher than that of the statistical simulated annealing algorithm, and reduces the time complexity from $O(2^n)$ to $O(n)$, where $n$ is the number of nerve cells in the neural network system.

The above two learning algorithms can be used to train a large class of stochastic neural networks, for example, Markov neural networks [6], neural networks based on mean-field theory approximation [5], and so on. Our neural network system in the intelligent system has a learning function [6], a memory function [8] and a thought function [9]. Having constructed the intelligent system, and having given the mathematical method for analyzing it and its algorithms, we have constructed a theory of intelligent systems, which is implemented through relative entropy minimization.